Managing software environments with Conda

The problem
Full reproducibility requires being able to recreate the system that originally generated the results.
Conda is a package, dependency, and environment manager
- package: any type of program (e.g. multiqc, snakemake etc.)
- dependency: other software required by a package
- dependencies in turn have their own dependencies
- environment: a distinct collection of packages
flowchart LR
subgraph environment
style environment fill:#00000000, stroke-width:1px
direction LR
multiqc(multiqc) -.-> numpy(numpy)
multiqc -.-> matplotlib(matplotlib)
multiqc -.-> python(python)
matplotlib -.-> python
matplotlib -.-> numpy
matplotlib -.-> fonttools(fonttools)
numpy -.-> python
numpy -.-> libcxx(libcxx)
end
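As a rough illustration of why dependencies of dependencies matter, the sketch below walks the graph from the diagram above and collects everything an environment containing multiqc would need. The `DIRECT_DEPS` table and the `resolve` function are hypothetical names made up for this example; Conda's real solver also handles version constraints and conflicts, which this sketch ignores.

```python
# Direct dependencies copied from the diagram above. Real package
# metadata is richer (version constraints, build strings, etc.).
DIRECT_DEPS = {
    "multiqc": ["numpy", "matplotlib", "python"],
    "matplotlib": ["python", "numpy", "fonttools"],
    "numpy": ["python", "libcxx"],
    "python": [],
    "fonttools": [],
    "libcxx": [],
}


def resolve(package, deps=DIRECT_DEPS):
    """Return the package plus everything it needs, directly or indirectly."""
    needed = set()
    stack = [package]
    while stack:
        current = stack.pop()
        if current in needed:
            continue  # already visited via another path
        needed.add(current)
        stack.extend(deps.get(current, []))
    return needed


environment = resolve("multiqc")
print(sorted(environment))
# ['fonttools', 'libcxx', 'matplotlib', 'multiqc', 'numpy', 'python']
```

Asking for a single package pulls in six, which is why an environment is best thought of as the whole collection rather than the one tool you installed.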
Conda channels
Channels are remote directories containing packages
flowchart TD
ch1[(channel1)] --- p1[package1]
ch1[(channel1)] --- p2[package2]
ch1[(channel1)] --- p3[package3]
ch2[(channel2)] --- p4[package4]
ch2[(channel2)] --- p5[package5]
ch2[(channel2)] --- p6[package6]
Conda channels
Two common examples are:
- bioconda (a channel specializing in bioinformatics software)
- conda-forge (a community-led channel made up of thousands of contributors)
flowchart TD
ch1[(bioconda)] --- p1[bowtie2]
ch1[(bioconda)] --- p2[fastqc]
ch1[(bioconda)] --- p3[snakemake]
ch2[(conda-forge)] --- p4[numpy]
ch2[(conda-forge)] --- p5[jupyter]
ch2[(conda-forge)] --- p6[wget]
p5 -.-> l1([conda install -c conda-forge -c bioconda snakemake jupyter])
p3 -.-> l1
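The install command above lists two channels, and each requested package is found in one of them. The sketch below models that lookup in a deliberately simplified way: it returns the first listed channel that provides the package. `CHANNELS` and `find_channel` are hypothetical names for this example, and the real solver weighs channel order together with versions and other constraints.

```python
# Hypothetical, tiny channel index based on the diagram above.
CHANNELS = {
    "bioconda": {"bowtie2", "fastqc", "snakemake"},
    "conda-forge": {"numpy", "jupyter", "wget"},
}


def find_channel(package, channel_order, channels=CHANNELS):
    """Return the first channel in channel_order that provides package."""
    for name in channel_order:
        if package in channels.get(name, set()):
            return name
    raise LookupError(f"{package} not found in {channel_order}")


order = ["conda-forge", "bioconda"]  # same order as the -c flags above
for pkg in ["snakemake", "jupyter"]:
    print(pkg, "<-", find_channel(pkg, order))
# snakemake <- bioconda
# jupyter <- conda-forge
```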
Defining and sharing environments
Define a Conda environment in an environment.yml file:
channels:
- conda-forge
- bioconda
dependencies:
- fastqc=0.11
- sra-tools=2.8
- snakemake=4.3.0
- multiqc=1.3
- bowtie2=2.3
- samtools=1.6
- htseq=0.9
- graphviz=2.38.0
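In a file like the one above, a single `=` pins a version prefix rather than one exact release, so `fastqc=0.11` accepts any 0.11.x version. The sketch below mimics that matching rule for illustration only. `parse_spec` and `fuzzy_match` are hypothetical helpers, they handle only the `name=version` form, and Conda's own match-spec logic covers many more cases.

```python
def parse_spec(line):
    """Split 'name=version' into (name, version); version is None if absent."""
    name, _, version = line.partition("=")
    return name, (version or None)


def fuzzy_match(pinned, candidate):
    """True if candidate equals the pinned version or extends it (0.11 -> 0.11.9)."""
    if pinned is None:
        return True  # no pin: any version is acceptable
    return candidate == pinned or candidate.startswith(pinned + ".")


name, pinned = parse_spec("fastqc=0.11")
for version in ["0.11", "0.11.9", "0.12.1", "0.110"]:
    print(name, version, fuzzy_match(pinned, version))
# fastqc 0.11 True
# fastqc 0.11.9 True
# fastqc 0.12.1 False
# fastqc 0.110 False
```

Pinning `snakemake=4.3.0` with three components is therefore stricter than `multiqc=1.3` with two, which is worth keeping in mind when deciding how tightly to pin.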
Defining and sharing environments
- Update an existing environment:
conda env update -f environment.yml
- Export environment (including all dependencies) to a file:
conda env export > environment.yml
- Export historical environment (only packages explicitly installed):
conda env export --from-history > environment.yml
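The two export styles differ in what they record. The sketch below contrasts them using a made-up list of installed packages: the versions for everything except multiqc are hypothetical, and `INSTALLED` and `export` are names invented for this example. A real full export also records build strings and is therefore often platform-specific.

```python
# Hypothetical installed packages: (name, version, requested_spec).
# requested_spec is what the user typed, or None for packages that were
# pulled in only as dependencies.
INSTALLED = [
    ("multiqc", "1.3", "multiqc=1.3"),
    ("matplotlib", "2.1.0", None),
    ("numpy", "1.13.3", None),
    ("python", "3.6.3", None),
]


def export(installed, from_history=False):
    """Render a 'dependencies:' section in either export style."""
    lines = ["dependencies:"]
    for name, version, requested_spec in installed:
        if from_history:
            # Keep only what was explicitly asked for, as it was typed.
            if requested_spec is not None:
                lines.append(f"  - {requested_spec}")
        else:
            # Record every installed package with its resolved version.
            lines.append(f"  - {name}={version}")
    return "\n".join(lines)


print(export(INSTALLED))
print(export(INSTALLED, from_history=True))
```

The full listing is the more exact record of one machine, while the history-style listing is shorter and tends to be easier to recreate on another operating system.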
Conda, Anaconda, Miniconda, Mamba…
- Conda: The package manager itself, written in Python
- Mamba: A faster re-implementation of Conda (written in C++)
- Anaconda:
  - An installer for Conda bundled with many preinstalled packages, with over 7,500 open-source packages available from its repository
  - A cloud service where conda packages are hosted (anaconda.org)
- Miniconda: A minimal installer for Conda, containing only the most necessary packages to get started
- Mambaforge: Installer with Mamba in the base environment, pre-configured to use the conda-forge channel
Mamba vs. Conda
Mamba is a faster re-implementation of Conda in C++.
- Install mamba with conda (or see the documentation for how to do a fresh install):
conda install -n base -c conda-forge mamba
- Simply replace conda with mamba on the command line:
mamba env create --name project_a -f environment.yml
mamba env update -f environment.yml
mamba env export > environment-full.yml
mamba env export --from-history > environment-history.yml
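Because the two tools accept the same subcommands here, a script can choose whichever is installed. The following is a minimal sketch of that idea, assuming both executables are discoverable on `PATH`. `pick_frontend` and `env_update_command` are hypothetical helper names, and the `which` parameter exists only so the lookup can be swapped out when testing.

```python
import shutil


def pick_frontend(which=shutil.which):
    """Return 'mamba' if it is on PATH, otherwise fall back to 'conda'."""
    return "mamba" if which("mamba") else "conda"


def env_update_command(env_file="environment.yml", which=shutil.which):
    """Build the argument list for updating an environment from a file."""
    return [pick_frontend(which), "env", "update", "-f", env_file]


# Print the command that would be run on this machine.
print(" ".join(env_update_command()))
```

The returned list could be passed to `subprocess.run` to execute the update. It is only printed here so the sketch has no side effects.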
Questions?